out-distribution sample
Provably Robust Detection of Out-of-distribution Data (almost) for free
Meinke, Alexander, Bitterwolf, Julian, Hein, Matthias
When applying machine learning in safety-critical systems, a reliable assessment of the uncertainy of a classifier is required. However, deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data and even if trained to be non-confident on OOD data one can still adversarially manipulate OOD data so that the classifer again assigns high confidence to the manipulated samples. In this paper we propose a novel method where from first principles we combine a certifiable OOD detector with a standard classifier into an OOD aware classifier. In this way we achieve the best of two worlds: certifiably adversarially robust OOD detection, even for OOD samples close to the in-distribution, without loss in prediction accuracy and close to state-of-the-art OOD detection performance for non-manipulated OOD data. Moreover, due to the particular construction our classifier provably avoids the asymptotic overconfidence problem of standard neural networks.
Controlling Over-generalization and its Effect on Adversarial Examples Generation and Detection
Abbasi, Mahdieh, Rajabi, Arezoo, Mozafari, Azadeh Sadat, Bobba, Rakesh B., Gagne, Christian
Convolutional Neural Networks (CNNs) allowed improving the state-of-the-art for many vision applications. However, naive CNNs suffer from two serious issues: vulnerability to adversarial examples and making incorrect but confident predictions for out-distribution samples. In this paper, we draw a connection between these two issues of CNNs through over-generalization. We reveal an augmented CNN (an extra output class added) as a simple yet effective end-to-end approach has the capacity for controlling over-generalization. We demonstrate training an augmented CNN on only a properly selected natural out-distribution dataset and interpolated samples empowers it to classify a wide range of unseen out-distribution samples as dustbin. Meanwhile, its misclassification rates on a broad spectrum of well-known black-box adversaries drop drastically as it classifies a portion of adversaries as dustbin class (rejection option) while correctly classifies some of the remaining. However, such an augmented CNN is never trained with any types of adversaries. Finally, generation of white-box adversarial attacks using augmented CNNs can be harder as the attack algorithms have to avoid dustbin regions for generating actual adversaries.
Out-distribution training confers robustness to deep neural networks
Abbasi, Mahdieh, Gagné, Christian
The easiness at which adversarial instances can be generated in deep neural networks raises some fundamental questions on their functioning and concerns on their use in critical systems. In this paper, we draw a connection between over-generalization and adversaries: a possible cause of adversaries lies in models designed to make decisions all over the input space, leading to inappropriate high-confidence decisions in parts of the input space not represented in the training set. We empirically show an augmented neural network, which is not trained on any types of adversaries, can increase the robustness by detecting black-box one-step adversaries, i.e. assimilated to out-distribution samples, and making generation of white-box one-step adversaries harder.